An Adaptable Search System for Collections of Partially Structured Documents

Author

  • Udo Kruschwitz
Abstract

need to understand the documents it is accessing. But what if the document collections you want to search are domain-specific or limited in size? This type of data source is everywhere, from corporate intranets to local Web sites. Wouldn’t it be useful to have a simple dialogue system that knows what data is available and can assist users in the search process? Furthermore, shouldn’t such a system be portable enough to be run on a completely different collection without much hassle? Here, I present such a search system, based on a generic framework that incorporates a simple domain-independent dialogue manager and an automatically created domain model. I constructed the model by exploiting the markup structure in documents and offer two different domains for which users can construct similar models rapidly, applicable without customization.

Similar resources

Toward Structured Retrieval in Semi-structured Information Spaces

A semi-structured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because each collection has its own schema, and there are no enforced keys or formats for data items across collections. Thus, structured methods like SQL cannot be easily employed, and users often must make do with only full-tex...

MatchDetectReveal: finding overlapping and similar digital documents

The Internet provides easy access to large collections of semi-structured digital documents. WWW browsers, search engines and the "cut & paste" technique are tempting to substitute one's creativity by simple compilation from appropriate digital resources. This paper discusses the problems of detecting plagiarism in large collections of semi-structured electronic texts. Overlaps in and similarit...

An Exponentiation Method for XML Element Retrieval

XML document is now widely used for modelling and storing structured documents. The structure is very rich and carries important information about contents and their relationships, for example, e-Commerce. XML data-centric collections require query terms allowing users to specify constraints on the document structure; mapping structure queries and assigning the weight are significant for the se...

Schema Independent Retrieval from Heterogeneous Structured Text

We present a query language for searching collections of structured text. Documents within the collection are not required to adhere to a global schema, nor are individual documents required to be structured according to any defined schema at all. Nonetheless, queries may directly reference structure across differently formatted documents. We briefly discuss application of the language to multilingua...

Cooperating Peers for Content-Oriented XML-Retrieval

Semi-structured documents formatted with the extensible markup language (XML) are gaining wide use by a whole range of applications including E-Commerce, E-Business, E-Science, Digital Libraries (DL), File Sharing, and in recent years especially by applications for Peer-to-Peer (P2P) systems. P2P architectures have been identified as an efficient means of ad-hoc collaboration and information s...

Journal title:
  • IEEE Intelligent Systems

Volume 18  Issue 

Pages  -

Publication date 2003